Abstract

In nonparametric statistics, an optimality criterion for estimation
procedures is provided by the minimax rate of convergence. However,
this classical point of view is subject to controversy, as it requires
looking for the worst behaviour reached by an estimation procedure in a
given space. The purpose of this paper is to show that this objection
is not justified, as the minimax risk often coincides with a generic
one. We are interested here in the rate of convergence attained by some
classical estimators on almost every function, in the sense of
prevalence, in a Besov space.

1 Introduction

Since its introduction in the seventies, nonparametric estimation has
taken a large place in the work of the mathematical and signal
processing communities. Often a signal has too many components, as in
the case of densities or curve images, for classical studies on
finite-dimensional spaces to give accurate results. But which kind of
estimator is most appropriate in nonparametric cases?

This question has raised many definitions and discussions in the
statistical community. How can two estimators be compared when they
target infinite-dimensional objects, and what kind of optimal
behaviour can be expected? One of the most common ways to test the
performance of a procedure is to compare its convergence rate with
an optimal one given by minimax theory. Nonetheless, this technique
relies on a particular definition which can be subject to controversy.
The main drawback is the pessimistic point of view of this theory,
which looks for the worst rate of estimation obtained in a given space.
In this paper, we introduce a new test of the risk, obtained thanks to
genericity theories, which shows that the minimax rate of estimation
is not as pessimistic as believed.

In the minimax paradigm, one supposes that a function $f$ belongs
to a certain space $\Theta$, which can be, for instance, a Sobolev or
Hölder space linked to some regularity properties, and one defines a
risk, or loss function, thanks to a pseudo-distance on $\Theta$,
denoted $R(\cdot,\cdot)$. Given a radius $C > 0$ and an estimation
procedure $\hat f_n$, depending on the model and on a data parameter
$n$, the maximal risk of $\hat f_n$ on $\Theta_C$ is then defined by:

\[ R_n(\hat f_n) = \sup_{f \in \Theta_C} \mathbb{E}(R(\hat f_n, f)), \tag{1} \]

where $\Theta_C$ is the closed ball in $\Theta$ with radius $C > 0$.

If $T_n$ denotes the set of all estimation procedures defined on the
model, the minimax risk on $\Theta_C$ is given by:

\[ R_n(\Theta_C) = \inf_{\hat f_n \in T_n} \sup_{f \in \Theta_C} \mathbb{E}(R(\hat f_n, f)). \tag{2} \]

This minimax risk gives an optimal bound over the function class
$\Theta_C$. It is thus natural for estimation procedures to attempt to
reach this risk, at least asymptotically when $n$ tends to infinity.

The main drawback of the minimax theory is that we are looking
for the maximum risk on a function space, thus for the worst
behaviour. This point of view seems pessimistic and may not be generic
enough, given that it is used to compare estimation procedures. Indeed,
the worst case could be a misleading one, and a method can be rejected
although it is a good one for a lot of functions. The purpose of this
paper is to show that this is not the case and that, in fact, the
minimax rate corresponds to a generic one. The purpose of this paper is
thus twofold. We introduce a new test of estimation performance, based
on generic properties. Thanks to this definition, we show that the
minimax risk coincides with an "almost every" one.

Let us first introduce what is meant by almost every function. In
a finite-dimensional space, we say that a property holds almost
everywhere if the set of points where it is not true has zero Lebesgue
measure. The Lebesgue measure has here a preponderant role, as it
is the only $\sigma$-finite and translation invariant measure.
Unfortunately, no measure shares those properties in
infinite-dimensional Banach spaces. A way to recover a natural "almost
every" notion in infinite-dimensional vector spaces was defined as
follows by J. Christensen in 1972, see [2, 4, 12].

Definition 1. Let $V$ be a complete metric vector space. A Borel set
$A \subset V$ is Haar-null (or shy) if there exists a compactly
supported probability measure $\mu$ such that

\[ \forall x \in V, \quad \mu(x + A) = 0. \tag{3} \]

If this property holds, the measure $\mu$ is said to be transverse to $A$.

A subset of $V$ is called Haar-null if it is contained in a Haar-null
Borel set. The complement of a Haar-null set is called a prevalent set.

As can be seen from the definition of prevalence, the main issue
in proofs is to construct a measure transverse to a given Borel
Haar-null set. We recall here two classical ways to construct such
measures.

Remarks. 1. A finite-dimensional subspace $P$ of $V$ is called a
probe for a prevalent set $T \subset V$ if the Lebesgue measure on $P$
is transverse to the complement of $T$.

Such measures are not compactly supported probability measures.
However, one immediately checks that this notion can be defined in
the same way with the Lebesgue measure restricted to the unit ball
of $P$. In this case, the support of the measure is included in the
unit ball of a finite-dimensional subspace, so the compactness
assumption is fulfilled.

2. If $V$ is a function space, a probability measure on $V$ can be
defined by a random process $X_t$ whose sample paths are almost
surely in $V$. The condition $\mu(f + A) = 0$ means that the event
$X_t - f \in A$ has probability zero. Therefore, a way to check that a
property $\mathcal{P}$ holds only on a Haar-null set is to exhibit a
random process $X_t$ whose sample paths are in $V$ and which is such
that

\[ \forall f \in V, \quad \text{a.s. } X_t + f \text{ does not satisfy } \mathcal{P}. \]

The following results enumerate important properties of prevalence 
and show that these notions supply a natural generalization of "zero 
measure" and "almost every" in finite-dimensional spaces, see [2, 4, 
12]. 

Proposition 1. • If $S$ is Haar-null, then for every $x \in V$,
$x + S$ is Haar-null.

• If $\dim(V) < \infty$, $S$ is Haar-null if and only if
$\mathrm{meas}(S) = 0$ (where meas denotes the Lebesgue measure).

• Prevalent sets are dense.

• The intersection of a countable collection of prevalent sets is
prevalent.

• If $\dim(V) = \infty$, compact subsets of $V$ are Haar-null.

As we can see from the properties of prevalent sets, this theory 
provides a natural generalization of the finite dimensional notion of 
almost every. Since its definition, this theory has been mainly used in 
the context of differential geometry [12] and regularity type properties 
[11]. A classical example is given in [11], where it is proved that the 
set of nowhere differentiable functions is prevalent in the space of 
continuous functions. 

Using this theory, a natural way to exhibit a risk for an estimation
procedure is to look at the risk reached on almost every function of
$\Theta$, in the sense of prevalence.

As the minimax theory has been widely studied, a large class of
results exists in different function spaces and with different losses.
Historically, the first one is the result of Pinsker [21], which shows
that suitable linear estimators reach the optimal $L^2$ risk rate on
$L^2$ Sobolev classes. If the risk function is given by an $L^p$ norm,
[13, 3] show that, under certain conditions, kernel estimators are
optimal in the sense of minimax theory in the same function spaces.
More recent results, such as those of [20], show that linear estimators
cannot reach the optimal bound in nonlinear regression, as soon as we
take the $L^p$ risk and Sobolev classes.

In this paper we focus on Besov spaces and take the general $L^p$
loss function. The interest of studying Besov spaces is motivated by
their practical use in approximation theory and their theoretical
simplicity in terms of wavelet expansions. Furthermore, from the
theoretical point of view, they also generalize some classical function
spaces, such as Hölder and $L^2$ Sobolev spaces.

In those Besov spaces, we study the performance, in terms of generic
approximation, of two classical estimation procedures in both the white
noise model and the density estimation problem. A second result gives
the generic rate of estimation for larger families of procedures in the
case of the white noise model.



2 Models and estimation procedures 

In the following, we consider two classical estimation problems. The
first one is given by the Gaussian white noise model. Following the
definition of [13], we suppose that we observe $Y_t$ such that

\[ dY_t = f(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in (0,1)^d, \tag{4} \]

where $dW_t$ stands for the $d$-dimensional Wiener measure, $n$ is
known and $f$ is the unknown function to be estimated.

The second theoretical framework in which our theory can be applied
is the problem of density estimation. In this case, the model is given
by a sequence $X_1, \dots, X_n$ of independent and identically
distributed random variables of unknown density $f$ on $\mathbb{R}$.

The estimation procedures that we deal with are defined thanks to
a basis decomposition of the function to be estimated. To define them,
we first introduce the wavelet decomposition. In our framework, those
bases make it possible to define both function spaces and estimation
procedures. They thus provide a key tool to introduce our results.

The wavelet transform is a powerful approximation tool largely
used in statistics applied to signal processing, thanks to its
localization properties in the time and frequency domains. Indeed,
these properties make it possible to reconstruct a signal with few
coefficients. Its use in the statistical community and the development
of wavelet-based estimators are thus natural, as introduced in [18].

To define wavelets, we refer to [6], where it is proved that for $r$
large enough there exist $2^d - 1$ functions $\psi^{(i)}$ with compact
support and which belong to $C^r$. Furthermore, each $\psi^{(i)}$ has
$r$ vanishing moments, and the set of functions
$\{\psi^{(i)}_{j,k}(x) = 2^{dj/2}\psi^{(i)}(2^j x - k),\ j \in \mathbb{Z},\ k \in \mathbb{Z}^d,\ i \in \{1, \dots, 2^d - 1\}\}$
forms an orthonormal basis of $L^2(\mathbb{R}^d)$. It is also
noticed in [19] that wavelets provide unconditional bases of
$L^p(\mathbb{R}^d)$ as long as $1 < p < \infty$. Taking periodized
wavelets allows us to restrict our study to $[0,1]^d$.

Thus any function $f \in L^p$ can be written as

\[ f(x) = \sum_{i,j,k} c^{(i)}_{j,k}\, \psi^{(i)}_{j,k}(x), \tag{5} \]

where

\[ c^{(i)}_{j,k} = 2^{jd/2} \int f(x)\, \psi^{(i)}(2^j x - k)\,dx. \tag{6} \]

We can notice that we stand in isotropic cases. Thus the direction of 
the wavelets is not involved and in the following, we omit the direc- 
tional index i. 

As the collection
$\{2^{dj/2}\psi^{(i)}(2^j x - k),\ j \in \mathbb{N},\ k \in \{0, \dots, 2^j - 1\}^d,\ i = 1, \dots, 2^d - 1\}$
forms an orthonormal basis of $L^2([0,1]^d)$, observing the whole
trajectory of $Y_t$ in (4) is equivalent to treating the following
problem, in which one observes
$(y_{j,k})_{j \in \mathbb{N},\, k \in \{0,\dots,2^j-1\}^d} \in \ell^2(\mathbb{N}^{d+1})$
such that, for all $j, k$,

\[ y_{j,k} = \theta_{j,k} + \frac{1}{\sqrt{n}}\, v_{j,k}, \tag{7} \]

where $y_{j,k} = \int \psi_{j,k}\,dY$, the $v_{j,k}$ are i.i.d. Gaussian
random variables and $(\theta_{j,k})$ is the sequence to be estimated.
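As a minimal sketch of the sequence model (7), assuming $d = 1$ and standard Gaussian noise $v_{j,k}$, the observations can be simulated as follows. The test signal `example_theta` and all function names are hypothetical illustrations, not part of the original text.

```python
import math
import random


def example_theta(levels, s):
    """Hypothetical d = 1 test signal: 2^j coefficients equal to 2^{-j(s + 1/2)} at level j."""
    return [[2.0 ** (-j * (s + 0.5))] * (2 ** j) for j in range(levels)]


def simulate_sequence_model(theta, n, rng):
    """Return y[j][k] = theta[j][k] + v / sqrt(n) with v ~ N(0, 1), as in model (7)."""
    sigma = 1.0 / math.sqrt(n)
    return [[t + sigma * rng.gauss(0.0, 1.0) for t in level] for level in theta]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the run is reproducible
    theta = example_theta(levels=4, s=1.0)
    y = simulate_sequence_model(theta, n=10_000, rng=rng)
    print(y[0], y[1])
```

The noise level $1/\sqrt{n}$ is the same at every scale, so coefficients at fine scales, which are small for a smooth signal, are eventually dominated by noise.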

In terms of density estimation, one can also notice that the density
function $f$ to be estimated can be represented in terms of wavelets as
$f = \sum_{j,k} \beta_{j,k}\psi_{j,k}$. In this case, the purpose is to
find a sequence $(\hat\beta_{j,k})_{j,k}$ approximating
$(\beta_{j,k})_{j,k}$.

Furthermore, wavelets are useful as they provide a simple
characterization of Besov spaces.

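Homogeneous Besov spaces are characterized, for $p, q > 0$ and
$s \in \mathbb{R}$, by

\[ f \in B_p^{s,q} \iff \exists C > 0, \quad \sum_{j \ge 0} \Big( \sum_{k \in \{0,\dots,2^j-1\}^d} |c_{j,k}|^p\, 2^{(sp - d + \frac{pd}{2})j} \Big)^{q/p} \le C. \tag{8} \]

This characterization is independent of the chosen wavelet as soon as
it has $r$ vanishing moments, with $r > s$.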
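The wavelet characterization can be evaluated numerically. The sketch below assumes the case $q = \infty$ (the one used in the main results), in which the sum over $j$ is read as a supremum, and assumes the weight $2^{(sp - d + pd/2)j}$ for $L^2$-normalized coefficients; under these assumptions the quantity is $\sup_j 2^{j(s - d/p + d/2)} (\sum_k |c_{j,k}|^p)^{1/p}$. The function name and the test sequence are hypothetical.

```python
def besov_sup_norm(coeffs, s, p, d=1):
    """Wavelet quantity for q = infinity:
    sup over j of 2^{j(s - d/p + d/2)} * (sum_k |c_{j,k}|^p)^{1/p}.
    coeffs[j] is the list of coefficients at scale j (finitely many scales)."""
    best = 0.0
    for j, level in enumerate(coeffs):
        lp_norm = sum(abs(c) ** p for c in level) ** (1.0 / p)
        best = max(best, 2.0 ** (j * (s - d / p + d / 2)) * lp_norm)
    return best


if __name__ == "__main__":
    s, p = 1.5, 3.0
    # Hypothetical sequence with 2^j coefficients of size 2^{-j(s + 1/2)} at scale j.
    coeffs = [[2.0 ** (-j * (s + 0.5))] * (2 ** j) for j in range(6)]
    print(besov_sup_norm(coeffs, s, p))        # stays bounded across scales
    print(besov_sup_norm(coeffs, s + 0.5, p))  # grows with the number of scales
```

For this test sequence every scale contributes exactly the same amount at smoothness $s$, which is why asking for any extra smoothness makes the quantity grow with the finest scale.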

We also denote by $B_{p,c}^{s,q}(\mathbb{R}^d)$ the closed ball in
$B_p^{s,q}(\mathbb{R}^d)$ of radius $c > 0$.

In terms of wavelet approximation, or in any basis, the most natural
and classical way to define estimators is given by linear estimation.

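Definition 2. Suppose that we stand in the model (7). Linear
estimators $\hat f_n^L$ are constructed by

\[ \hat f_n^L(x) = \sum_{j,k} \hat\theta^{(L)}_{j,k}\, \psi_{j,k}(x), \tag{9} \]

where

\[ \hat\theta^{(L)}_{j,k} = \lambda^{(n)}_{j,k}\, y_{j,k}. \tag{10} \]

The parameters $(\lambda^{(n)}_{j,k})_{j,k}$ can be seen as smoothing
weights depending on the problem. Those weights can be of different
natures. Classical ones are:

• Projection weights: $\lambda^{(n)}_{j,k} = 1_{\{j < m_n\}}$.

• Pinsker weights: $\lambda^{(n)}_{j,k} = \left(1 - \left(\frac{j}{m_n}\right)^{\alpha}\right)_+$,

where $m_n$ is an increasing function of $n$.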
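The two families of weights can be sketched as follows. This assumes the projection weights are the indicator of $j < m_n$ and the Pinsker weights have the form $(1 - (j/m_n)^{\alpha})_+$; both readings are assumptions about the formulas above, and the function names are hypothetical.

```python
def projection_weight(j, m_n):
    """Assumed projection weight: keep every scale j < m_n, drop the others."""
    return 1.0 if j < m_n else 0.0


def pinsker_weight(j, m_n, alpha):
    """Assumed Pinsker-type weight (1 - (j / m_n)^alpha)_+ : smooth decay to zero at m_n."""
    return max(0.0, 1.0 - (j / m_n) ** alpha)


def linear_estimate(y, weight):
    """Apply a scale-dependent weight to noisy coefficients y[j][k], as in (10)."""
    return [[weight(j) * c for c in level] for j, level in enumerate(y)]


if __name__ == "__main__":
    y = [[1.0], [0.4, -0.3], [0.1, 0.0, -0.1, 0.2]]
    print(linear_estimate(y, lambda j: projection_weight(j, 2)))
    print(linear_estimate(y, lambda j: pinsker_weight(j, 2, 2.0)))
```

In both cases the weight depends on the scale only and never on the data, which is what makes the resulting estimator linear in the observations.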

Definition 3. Suppose that we stand in the model of density
estimation. In this case, a linear estimator of the density $f$ is
constructed by taking

\[ \hat\beta_{j,k} = \frac{1}{n} \sum_{i=1}^{n} \psi_{j,k}(X_i). \tag{11} \]

and

\[ \hat f_n^L = \sum_{j,k} \lambda^{(n)}_{j,k}\, \hat\beta_{j,k}\, \psi_{j,k}. \tag{12} \]
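The empirical coefficients of Definition 3 can be sketched as follows, assuming they are the sample means of $\psi_{j,k}(X_i)$ and taking $d = 1$. The Haar wavelet is used purely for illustration: it is not $C^r$, so it does not satisfy the regularity assumptions made on the wavelets in this paper. All names are hypothetical.

```python
def haar_psi(x):
    """Haar mother wavelet on [0, 1): +1 on the first half, -1 on the second half."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def psi_jk(x, j, k):
    """L2-normalized dilated and translated wavelet 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_psi(2 ** j * x - k)


def empirical_coefficient(sample, j, k):
    """Empirical wavelet coefficient: average of psi_{j,k} over the sample."""
    return sum(psi_jk(x, j, k) for x in sample) / len(sample)


if __name__ == "__main__":
    sample = [0.1, 0.6]
    print(empirical_coefficient(sample, 0, 0))
    print(empirical_coefficient(sample, 1, 0))
```

Each empirical coefficient is an unbiased estimate of $\beta_{j,k} = \int f \psi_{j,k}$, since the $X_i$ have density $f$.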
The localization property of wavelet expansions is such that a given
signal may have a sparse representation in those bases. Thus a natural
estimation procedure in the white noise model, defined in [7] and
widely used ever since in the signal community, is to discard small
wavelet coefficients. This is the principle of wavelet thresholding.

Definition 4. Suppose that we stand in the case of the white noise
model (7). The wavelet thresholding procedure is then defined by

\[ \hat f_n^T = \sum_{j=0}^{j(n)} \sum_{k} \hat\theta^T_{j,k}\, \psi_{j,k}. \tag{13} \]

Here the weights are given by:

\[ \hat\theta^T_{j,k} = y_{j,k}\, 1_{\{|y_{j,k}| \ge \kappa t_n\}}, \tag{14} \]

in the case of hard thresholding, or

\[ \hat\theta^T_{j,k} = \mathrm{sign}(y_{j,k})\,(|y_{j,k}| - \kappa t_n)_+, \tag{15} \]

for the soft thresholding. Here,

\[ t_n = \sqrt{\frac{\log n}{n}} \]

stands for the universal threshold and $j(n)$ is such that

\[ 2^{-j(n)} \le \frac{\log n}{n} < 2^{-j(n)+1}, \]

$\kappa$ being a constant large enough.
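The thresholding rules can be sketched as follows. This assumes the universal threshold $t_n = \sqrt{\log n / n}$ and the cut-off level $j(n)$ defined by $2^{-j(n)} \le \log n / n < 2^{-j(n)+1}$; a strict comparison is used for the hard rule, which differs from a non-strict one only when a coefficient is exactly equal to the threshold. The function names are hypothetical.

```python
import math


def universal_threshold(n):
    """Assumed universal threshold t_n = sqrt(log(n) / n)."""
    return math.sqrt(math.log(n) / n)


def max_level(n):
    """Integer j(n) with 2^{-j(n)} <= log(n)/n < 2^{-j(n)+1}."""
    return math.ceil(math.log2(n / math.log(n)))


def hard_threshold(y, thr):
    """Keep the coefficient unchanged when it is larger than thr in absolute value."""
    return y if abs(y) > thr else 0.0


def soft_threshold(y, thr):
    """Shrink the coefficient towards zero by thr: sign(y) * (|y| - thr)_+."""
    return math.copysign(max(abs(y) - thr, 0.0), y)


def threshold_estimate(y, n, kappa, rule=hard_threshold):
    """Threshold y[j][k] at level kappa * t_n and discard every scale j > j(n)."""
    thr = kappa * universal_threshold(n)
    j_max = max_level(n)
    return [[rule(c, thr) if j <= j_max else 0.0 for c in level]
            for j, level in enumerate(y)]


if __name__ == "__main__":
    y = [[1.0], [0.05, -0.5]]
    print(threshold_estimate(y, n=1024, kappa=1.0))
    print(threshold_estimate(y, n=1024, kappa=1.0, rule=soft_threshold))
```

Unlike the linear weights, the decision to keep a coefficient depends on the observed value itself, so the procedure adapts to where the signal is concentrated.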

Once again, in the model of density estimation wavelet threshold- 
ing is obtained thanks to a slight modification of the previous defini- 
tion. 

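Definition 5. Suppose that we stand in the problem of density
estimation, and let $\hat\beta_{j,k}$ be the coefficients defined in
(11). Then the density estimator by wavelet thresholding is given by

\[ \hat f_n^T = \sum_{j=1}^{j(n)} \sum_{k} \hat\beta^T_{j,k}\, \psi_{j,k}, \]

where $\hat\beta^T_{j,k}$ is obtained from $\hat\beta_{j,k}$ by the hard
or soft thresholding rule of Definition 4 at level $\kappa t_n$,

\[ t_n = \sqrt{\frac{\log n}{n}} \]

is the universal threshold and $j(n)$ is such that

\[ 2^{-j(n)} \le \frac{\log n}{n} < 2^{-j(n)+1}. \]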

Those estimators all belong to larger classes of estimation proce- 
dures, namely the classes of limited and elitist rules. 

Definition 6. Suppose that we stand in the white noise model (7).
Let us consider the class of shrinkage estimators

\[ T_n = \Big\{ \hat f_n = \sum_{j,k} \gamma_{j,k}\, y_{j,k}\, \psi_{j,k};\ \gamma_{j,k} \in [0,1], \text{ measurable} \Big\}. \]

We say that $\hat f_n \in T_n$ is a limited rule if there exist a
deterministic function $\lambda_n$ and a constant $a > 0$ such that for
every $j, k$:

\[ \gamma_{j,k} > a \implies 2^{-j} > \lambda_n. \tag{19} \]

In this case, one says that $\hat f_n$ belongs to the class
$\mathcal{C}(\lambda_n, a)$.

We say that $\hat f_n \in T_n$ is an elitist rule if there exist a
deterministic function $\lambda_n$ and a constant $a > 0$ such that for
every $j, k$:

\[ \gamma_{j,k} > a \implies |y_{j,k}| > \lambda_n. \tag{20} \]

In this case, $\hat f_n$ belongs to the class $\mathcal{E}(\lambda_n, a)$.

One can easily see that the linear estimators introduced in Definition
2 are limited rules and that thresholding algorithms, hard and soft
thresholding, or some Bayesian procedures with Gaussian prior are
elitist rules.
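These two claims can be checked numerically on finite samples. The sketch below rewrites hard and soft thresholding as shrinkage weights $\gamma$ with $\hat\theta = \gamma y$, and tests the implications "weight above $a$ implies $|y| > \lambda_n$" (elitist) and "weight above $a$ implies $2^{-j} > \lambda_n$" (limited). The helper names are hypothetical, and the check only covers the values supplied, so it is an illustration rather than a proof.

```python
def hard_weight(y, thr):
    """Weight gamma such that gamma * y is the hard thresholding of y."""
    return 1.0 if abs(y) > thr else 0.0


def soft_weight(y, thr):
    """Weight gamma such that gamma * y is the soft thresholding of y."""
    return max(0.0, 1.0 - thr / abs(y)) if y != 0.0 else 0.0


def satisfies_elitist(ys, weight, lam, a):
    """On the sample ys: every coefficient with weight > a has |y| > lam."""
    return all(abs(y) > lam for y in ys if weight(y) > a)


def satisfies_limited(max_j, level_weight, lam, a):
    """On scales 0..max_j: every scale with weight > a has 2^{-j} > lam."""
    return all(2.0 ** (-j) > lam for j in range(max_j + 1) if level_weight(j) > a)


if __name__ == "__main__":
    ys = [-1.0, -0.3, 0.0, 0.2, 0.8]
    thr = 0.5
    print(satisfies_elitist(ys, lambda y: hard_weight(y, thr), thr, 0.0))
    print(satisfies_elitist(ys, lambda y: soft_weight(y, thr), thr, 0.0))
    # Projection onto scales j < 3 is limited with lambda = 2^{-3}.
    print(satisfies_limited(10, lambda j: 1.0 if j < 3 else 0.0, 2.0 ** -3, 0.0))
```

The rule that keeps every coefficient (weight identically 1) fails the elitist check for any positive threshold, which shows that the condition is a genuine restriction.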

3 Statement of main results 

Let us recall minimax results in Besov spaces. Taking the $L^p$ norm,
where $1 < p < \infty$, as the loss function, we know from [9], for the
white noise model or in the case of density estimation, that the
minimax lower bound in closed balls of Besov spaces is given by the
following proposition.
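Proposition 2. Let $1 < r < \infty$, $1 < p < \infty$ and
$s > \frac{d}{r}$. Then there exists $C > 0$ such that

\[ R_n(B_{r,c}^{s,\infty}) = \inf_{T_n} \sup_{f \in B_{r,c}^{s,\infty}} \mathbb{E}\|T_n - f\|_{L^p}^p \ge C\, r_n(s,r,p), \tag{21} \]

where

\[ r_n(s,r,p) = \begin{cases} n^{-\frac{ps}{2s+d}} & \text{if } r > \frac{pd}{2s+d}, \\[4pt] \left(\frac{n}{\log n}\right)^{-\frac{p(s - d/r + d/p)}{2(s - d/r) + d}} & \text{else.} \end{cases} \tag{22} \]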

Let us now check what is known concerning the estimation procedures
that we deal with. Although it is proved in [9] that thresholding
procedures asymptotically reach the optimal rate up to a logarithmic
correction, this is not always the case for linear procedures. As can
be seen in [8], with the $L^p$ risk, linear estimators do not attain
the minimax rate when the functions studied have a sparse
representation in a given basis. In fact, the following proposition,
proved in [8], gives the optimal rate that can be reached in this case.

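Proposition 3. Let $1 < r < \infty$, $1 < p < \infty$ and
$s > \frac{d}{r}$. There exists $C > 0$ such that

\[ R_n^{lin}(B_{r,c}^{s,\infty}) = \inf_{T_n \text{ linear}} \sup_{f \in B_{r,c}^{s,\infty}} \mathbb{E}\|T_n - f\|_{L^p}^p \ge C\, \tilde r_n(s,r,p), \tag{23} \]

where

\[ \tilde r_n(s,r,p) = \begin{cases} n^{-\frac{ps}{2s+d}} & \text{if } r > p, \\[4pt] \left(\frac{n}{\log n}\right)^{-\frac{ps'}{2s'+d}} & \text{else,} \end{cases} \tag{24} \]

and $s' = s - \frac{d}{r} + \frac{d}{p}$.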

We see in the following theorem that Proposition 3 remains true
if we replace the maximal risk by the risk reached for almost every
function. We also prove that, in the same context, Proposition 2 is
true for thresholding algorithms up to a logarithmic term. We say in
the following that $a_n \approx b_n$ if
$\frac{\log a_n}{\log b_n} \to 1$.

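Theorem 1. Let $1 < r < \infty$, $1 < p < \infty$ and
$s > \frac{d}{r}$. Then, in the context of (4) or for the problem of
density estimation:

• For a suitable linear estimator $\hat f_n^L$ as in Definition 2 and
for almost every function $f$ in $B_r^{s,\infty}([0,1]^d)$,

\[ \mathbb{E}\|\hat f_n^L - f\|_{L^p}^p \approx n^{-\alpha p}, \tag{25} \]

where

\[ \alpha = \begin{cases} \frac{s}{2s+d} & \text{if } r > p, \\[4pt] \frac{s - d/r + d/p}{2(s - d/r + d/p) + d} & \text{else.} \end{cases} \tag{26} \]

• For almost every function $f$ in $B_r^{s,\infty}([0,1]^d)$, and for
the thresholding estimator $\hat f_n^T$,

\[ \mathbb{E}\|\hat f_n^T - f\|_{L^p}^p \approx \left(\frac{n}{\log n}\right)^{-\alpha p / 2}, \tag{27} \]

where

\[ \alpha = \begin{cases} \frac{2s}{2s+d} & \text{if } r > \frac{pd}{2s+d}, \\[4pt] \frac{2(s - d/r + d/p)}{2(s - d/r) + d} & \text{else.} \end{cases} \tag{28} \]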

This theorem can be extended in term of shrinkage classes thanks 
to the following result. 

Theorem 2. Let $1 < r < \infty$, $1 < p < \infty$ and
$s > \frac{d}{r}$. Then, in the context of (4), one has:

1. For limited rules, and for almost every function $f$ in
$B_r^{s,\infty}([0,1]^d)$, for every $C > 0$,

n\L-f\\%>Cn-^, (29) 

where 

ilird *fr>P, 

2. For almost every function in $B_r^{s,\infty}([0,1]^d)$, and for
elitist rules, for every $C > 0$,

nfn-nl^C^y" (31) 

where 

ifr > 



a = < 



2s+d "J ' ^ 2s+d , . 

, (32) 
— — 3-^- else. 

The previous theorems are stated in terms of Besov spaces and 
wavelet based estimation. Nevertheless, they can be easily extended 
to other function spaces, such as Sobolev spaces, thanks to an adapted 
basis. One can also notice that all of our results are given in terms of 
polynomial rate of estimation.

4 Proofs of Theorems 1 and 2



The proofs of these theorems are based on the maximal space on which
an estimation procedure attains a given rate of convergence. For the
sake of completeness, let us recall some basic facts about the
corresponding theory.

4.1 Maxiset theory 

The maxiset theory introduced recently in [5, 16, 17] is an alterna- 
tive way to compare different estimation procedures. In our case, it 
provides a crucial key to prove Theorem 1. The main idea is to look 
for the maximal space on which an estimator will reach a given rate, 
instead of searching an optimal rate for a given space. 

Definition 7. Let $\rho$ be a risk function and $(v_n)_{n \in \mathbb{N}}$
a sequence such that $v_n \to 0$. For an estimation procedure
$\hat f_n$, the maximal space associated with $\rho$, $v_n$ and a
constant $T$ is given by

\[ MS(\hat f_n, \rho, v_n)(T) = \Big\{ f;\ \sup_{n} v_n^{-1}\, \mathbb{E}\,\rho(\hat f_n, f) < T \Big\}. \tag{33} \]

Several improvements were made in nonparametric theory thanks
to this idea. For instance, it is shown in [5] that, for the density
estimation model, the thresholding procedure is more efficient than the
linear procedure, whose maxiset is given in [15]. In the
heteroscedastic white noise model, [22, 23] showed that thresholding
procedures are better than linear estimators and as good as Bayesian
procedures. In the case of the white noise model, we recall the
following result, which is a particular case of [22].

Proposition 4. Let $1 < p < \infty$, $1 < r < \infty$,
$s > \frac{d}{r}$ and $\alpha \in (0,1)$.

Let $\hat f_n^L$ be the estimator given in Definition 2.

For every $f$, we have the following equivalence:
There exists $c > 0$ such that for every $n \in \mathbb{N}$,

\[ \mathbb{E}\|\hat f_n^L - f\|_{L^p}^p \le c\, m_n^{-\alpha p} \tag{34} \]

if and only if $f \in B_p^{\alpha,\infty}$.

Before stating the result associated with thresholding algorithms,
we define new function spaces closely related to approximation theory.
Those spaces, weak Besov spaces, defined in [5], are subsets of Lorentz
spaces, and constitute a larger class than Besov spaces.
Definition 8. Let $0 < r < p < \infty$. We say that a function
$f = \sum_{j,k} c_{j,k}\psi_{j,k}$ belongs to $W(r,p)$ if and only if

\[ \sup_{\lambda > 0} \lambda^r \sum_{j \ge 0} 2^{j(\frac{dp}{2} - d)} \sum_{k} 1_{\{|c_{j,k}| > \lambda\}} < \infty. \tag{35} \]

A fast calculation shows that the space $W(r,p)$ contains the Besov
spaces $B_r^{\beta,\infty}$ as soon as
$\beta \ge \frac{d}{2}\left(\frac{p}{r} - 1\right)$.
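For a finite array of coefficients, the quantity in Definition 8 can be computed exactly. The sketch below assumes the functional is $\sup_{\lambda>0} \lambda^r \sum_j 2^{j(dp/2-d)} \sum_k 1_{\{|c_{j,k}|>\lambda\}}$. Since the weighted count is piecewise constant in $\lambda$ and $\lambda^r$ is increasing, the supremum is approached as $\lambda$ increases to one of the coefficient magnitudes, so only those values need to be examined. The function name is hypothetical.

```python
def weak_besov_functional(coeffs, r, p, d=1):
    """sup over lam > 0 of lam^r * sum_j 2^{j(dp/2 - d)} * #{k : |c_{j,k}| > lam}
    for a finite array coeffs[j][k]."""
    # Pair each non-zero magnitude with the weight of its scale, largest first.
    items = sorted(
        ((abs(c), 2.0 ** (j * (d * p / 2 - d)))
         for j, level in enumerate(coeffs) for c in level if c != 0.0),
        reverse=True,
    )
    best = 0.0
    mass = 0.0  # weighted count of coefficients with magnitude >= current one
    for magnitude, weight in items:
        mass += weight
        best = max(best, magnitude ** r * mass)
    return best


if __name__ == "__main__":
    coeffs = [[1.0], [0.5, 0.5]]
    print(weak_besov_functional(coeffs, r=1.0, p=2.0))
    print(weak_besov_functional(coeffs, r=1.0, p=4.0))
```

The condition only counts how many coefficients exceed each level, which is why $W(r,p)$ is larger than the corresponding Besov spaces: a few large coefficients at fine scales are tolerated.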

The maxiset associated with the thresholding estimation procedure 
is given by a weak Besov space as proved in [5] , and developed further 
in the heteroscedastic regression case in [16]. 

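Proposition 5. Let $1 < p < \infty$, $1 < r < \infty$,
$s > \frac{d}{r}$ and $\alpha \in (0,1)$. Let $\hat f_n^T$ be the
estimator defined by (4) and (15). Then for every $f$ we have the
following equivalence:
there exists $K > 0$ such that for every $n > 0$,

\[ \mathbb{E}\|\hat f_n^T - f\|_{L^p}^p \le K \left(\sqrt{n}\,\log(n)^{-1/2}\right)^{-\alpha p} \tag{36} \]

if and only if $f \in B_p^{\alpha/2,\infty} \cap W((1-\alpha)p, p)$.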

Concerning shrinkage procedures, the following result from [1] also 
gives maxiset results for a large class of procedures. 

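Proposition 6. Let $1 < p < \infty$ and $\alpha > 0$ be fixed.

• Let $\hat f_n$ be a limited rule in $\mathcal{C}(\lambda_n, a)$, with
$a \in [0,1[$ and $\lambda_n$ a non-decreasing continuous function. Then

\[ MS(\hat f_n, \|\cdot\|_p^p, \lambda_n^{\alpha p}) \subset B_p^{\alpha,\infty}. \tag{37} \]

• Let $\hat f_n$ be an elitist rule in $\mathcal{E}(\lambda_n, a)$, with
$a \in [0,1[$ and $\lambda_n$ a non-decreasing continuous function. Then

\[ MS(\hat f_n, \|\cdot\|_p^p, \lambda_n^{p-\alpha}) \subset W(\alpha, p). \tag{38} \]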

Furthermore, another important key result involving Besov spaces 
is the following proposition from [10]. 

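Proposition 7. Let us define the scaling function of a distribution $f$
by

\[ \forall p > 0, \quad s_f(p) = \sup\{s : f \in B_p^{s,\infty}\}. \tag{39} \]

Let $s_0$ and $p_0$ be fixed such that $s_0 - \frac{d}{p_0} > 0$.
Outside a Haar-null set in $B_{p_0}^{s_0,\infty}(\mathbb{R}^d)$, we
have:

\[ s_f(p) = \begin{cases} s_0 & \text{if } p \le p_0, \\[4pt] \frac{d}{p} + s_0 - \frac{d}{p_0} & \text{if } p \ge p_0. \end{cases} \tag{40} \]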
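The generic scaling function is simple to tabulate. The sketch below assumes the two-branch reading of (40), namely $s_f(p) = s_0$ for $p \le p_0$ and $s_f(p) = d/p + s_0 - d/p_0$ for $p \ge p_0$; the function name is hypothetical. With $(s_0, p_0) = (s, r)$, the second branch is the index $s - d/r + d/p$ that governs the linear rate when $p > r$.

```python
def generic_scaling_function(p, s0, p0, d=1):
    """Assumed generic value of s_f(p) for almost every f in B_{p0}^{s0, infinity}."""
    if p <= p0:
        return s0
    return d / p + s0 - d / p0


if __name__ == "__main__":
    s, r = 2.0, 1.0
    for p in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(p, generic_scaling_function(p, s0=s, p0=r))
```

The function is constant up to $p_0$ and then decreases, staying above $s_0 - d/p_0 > 0$, which matches what the Besov embeddings allow and nothing more.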

One can check that a lower bound of this scaling function is given
by Besov embeddings and interpolation theory, which can be found in
[24]. This result thus states that we cannot have a better regularity
than the one given by those embeddings. In our case, we will exploit
this result by comparing those critical spaces with the maxiset
associated with each procedure.



4.2 Generic risk for linear estimators 

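Let $1 < p < \infty$, $1 < r < \infty$ and $s > \frac{d}{r}$ be fixed.
Denote

\[ s' = \begin{cases} s & \text{if } r \ge p, \\[4pt] s - \frac{d}{r} + \frac{d}{p} & \text{else,} \end{cases} \qquad \text{and} \qquad \alpha(s') = \frac{s'}{2s'+d}. \tag{41} \]

In this section, we prove the first part of Theorem 1. We define the
linear estimator as in Definitions 2 and 3. A bias-variance compromise
shows that for $r > p$ we have to take $m_n = n^{\frac{1}{2s+d}}$,
whereas when $p > r$ the bias and the variance are balanced when
$m_n = n^{\frac{1}{2(s - d/r + d/p) + d}}$.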

The upper bound is straightforward. Actually, from [7] we know
that there exists $c > 0$ such that for any $n \in \mathbb{N}$ and for
any $f \in B_r^{s,\infty}([0,1]^d)$,

\[ \mathbb{E}\|\hat f_n^L - f\|_{L^p}^p \le c\, n^{-\alpha(s')p}. \tag{42} \]

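Let us now check the lower bound. First, we have to show that for
every fixed $\varepsilon > 0$, the set

\[ M(\varepsilon) = \big\{ f \in B_r^{s,\infty}([0,1]^d);\ \exists c > 0\ \forall n \in \mathbb{N},\ \mathbb{E}(\|\hat f_n^L - f\|_{L^p}^p) \le c\, n^{-(\alpha(s')+\varepsilon)p} \big\} \tag{43} \]

is a Haar-null set.

Furthermore, with the particular form of $m_n$ taken, that is
$m_n = n^{\frac{1}{2s'+d}}$, it coincides with the following set:

\[ \big\{ f \in B_r^{s,\infty}([0,1]^d);\ \exists c > 0\ \forall n \in \mathbb{N},\ \mathbb{E}(\|\hat f_n^L - f\|_{L^p}^p) \le c\, m_n^{-(s'+\varepsilon')p} \big\}, \]

where $\varepsilon' = (2s'+d)\varepsilon$.

But, by applying Proposition 4, we see that $M(\varepsilon)$ is included
in $B_p^{s'+\varepsilon',\infty}([0,1]^d)$, and from Proposition 7 we
know that this set is a Haar-null Borel set of
$B_r^{s,\infty}([0,1]^d)$.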

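We thus obtain that, for every $\varepsilon > 0$, the set

\[ \big\{ f \in B_r^{s,\infty}([0,1]^d);\ \exists c > 0\ \forall n \in \mathbb{N},\ \mathbb{E}(\|\hat f_n^L - f\|_{L^p}^p) \le c\, n^{-(\alpha(s')+\varepsilon)p} \big\} \]

is a Haar-null set. This set can also be written

\[ \Big\{ f \in B_r^{s,\infty}([0,1]^d);\ \liminf_{n \to \infty} \frac{\log \mathbb{E}(\|\hat f_n^L - f\|_{L^p}^p)}{-p \log n} \ge \alpha(s') + \varepsilon \Big\}. \]

Taking the countable union of those sets over a decreasing sequence
$\varepsilon_n \to 0$, and passing to the complement, we obtain that for
almost every function in $B_r^{s,\infty}([0,1]^d)$,

\[ \liminf_{n \to \infty} \frac{\log \mathbb{E}(\|\hat f_n^L - f\|_{L^p}^p)}{-p \log n} \le \alpha(s'), \]

which yields the expected result.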
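The logarithmic ratio used in this argument can be illustrated numerically. The sketch below computes $\log(\text{risk}) / (-p \log n)$ for a hypothetical risk of the form $c\, n^{-\alpha p}$ and shows that the ratio tends to $\alpha$ whatever the constant $c$, which is why constants and logarithmic factors do not affect rates compared on this scale. The function name and the numerical values are illustrative assumptions.

```python
import math


def log_rate_exponent(risk, n, p):
    """Ratio log(risk) / (-p * log(n)), i.e. the exponent a such that risk = n^{-a p}."""
    return math.log(risk) / (-p * math.log(n))


if __name__ == "__main__":
    alpha, p, c = 0.4, 2.0, 5.0
    for n in (1e3, 1e6, 1e12, 1e100):
        risk = c * n ** (-alpha * p)  # hypothetical risk with a constant factor
        print(n, log_rate_exponent(risk, n, p))
```

The convergence is slow, of order $1/\log n$, so the ratio is a tool for asymptotic statements rather than a practical way to read off an exponent from a few sample sizes.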



4.3 Thresholding algorithms 

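In this part, we take the estimation procedures $\hat f_n^T$ given in
Definition 4 and Definition 5.

Let us turn our attention to the minimax rate of convergence.
For this purpose, we write in the following

\[ \alpha(s) = \begin{cases} \frac{2s}{2s+d} & \text{if } r > \frac{pd}{2s+d}, \\[4pt] \frac{2(s - \frac{d}{r} + \frac{d}{p})}{2(s - \frac{d}{r}) + d} & \text{else.} \end{cases} \tag{45} \]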
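The exponents of the two procedures can be compared directly. The sketch below assumes the two-case reading of (45) for the thresholding exponent $\alpha(s)$, entering the risk as $(n/\log n)^{-\alpha(s)p/2}$, and the linear exponent $\alpha(s') = s'/(2s'+d)$ of Section 4.2, entering as $n^{-\alpha(s')p}$, with $s' = s$ for $r \ge p$ and $s' = s - d/r + d/p$ otherwise. Up to logarithmic factors, thresholding is faster exactly when $\alpha(s)/2 > \alpha(s')$. The function names are hypothetical.

```python
def linear_exponent(s, r, p, d=1):
    """Assumed linear exponent alpha(s') = s' / (2 s' + d)."""
    s_prime = s if r >= p else s - d / r + d / p
    return s_prime / (2 * s_prime + d)


def thresholding_exponent(s, r, p, d=1):
    """Assumed thresholding exponent alpha(s) of (45), dense and sparse cases."""
    if r > p * d / (2 * s + d):
        return 2 * s / (2 * s + d)
    return 2 * (s - d / r + d / p) / (2 * (s - d / r) + d)


if __name__ == "__main__":
    for s, r, p in ((2.0, 4.0, 2.0), (2.0, 1.0, 4.0), (1.5, 1.0, 8.0)):
        lin = linear_exponent(s, r, p)
        thr = thresholding_exponent(s, r, p) / 2
        print(f"s={s}, r={r}, p={p}: linear {lin:.4f}, thresholding {thr:.4f}")
```

When $r \ge p$ the two exponents agree, and the gap appears only when the loss index $p$ exceeds the Besov index $r$, that is, when the functions may have a sparse wavelet representation.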



The proof of the second point of Theorem 1 follows the same
scheme as the previous one. In this case, the upper bound is given in
[9]. Thus we know that for every function in $B_r^{s,\infty}([0,1]^d)$,
and for all $1 < p < \infty$,

\[ \mathbb{E}(\|\hat f_n^T - f\|_{L^p}^p) \le c \left(\sqrt{\frac{n}{\log n}}\right)^{-\alpha(s)p}. \]

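In order to prove the lower bound, we use Proposition 5.

For every value of $\alpha$, let $0 < \varepsilon < 1 - \alpha$ be
fixed, and let $M(\varepsilon)$ be the set defined by

\[ M(\varepsilon) = \Big\{ f \in B_r^{s,\infty}([0,1]^d);\ \exists c > 0\ \forall n \in \mathbb{N},\ \mathbb{E}(\|\hat f_n^T - f\|_{L^p}^p) \le c \left(\sqrt{\frac{n}{\log n}}\right)^{-(\alpha+\varepsilon)p} \Big\}. \]

Thanks to Proposition 5, this set $M(\varepsilon)$ is embedded in
$B_p^{(\alpha+\varepsilon)/2,\infty} \cap W((1-\alpha-\varepsilon)p, p)$.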

The end of the proof is based on the following proposition. 

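Proposition 8. Let us define the weak scaling function of a
distribution $f$ by

\[ \forall p > 0, \quad \tilde s_f(p) = \sup\{\alpha : f \in W((1-\alpha)p, p)\}. \tag{46} \]

Let $s$ and $r$ be fixed such that $s - \frac{d}{r} > 0$. Outside a
Haar-null set in $B_r^{s,\infty}(\mathbb{R}^d)$, we have:

\[ \tilde s_f(p) = \begin{cases} \frac{2s}{2s+d} & \text{if } r > \frac{pd}{2s+d}, \\[4pt] \frac{2(s - \frac{d}{r} + \frac{d}{p})}{2(s - \frac{d}{r}) + d} & \text{else.} \end{cases} \tag{47} \]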

Proof: In order to prove Proposition 8, let us prove that
$W((1-\alpha-\varepsilon)p, p)$ is a Haar-null Borel set in
$B_r^{s,\infty}([0,1]^d)$. For this purpose, we define our transverse
measure as the probe generated by the function $g$ defined by its
wavelet coefficients:

\[ g_{j,k} = \frac{2^{-(s - \frac{d}{r} + \frac{d}{2})j}\; 2^{-\frac{d}{r}J}}{j^{a}} \tag{48} \]

where a = 1 + ^ and < J < j and K G {0, . . . , 2 J — l} d are such 
that 

#4 < 49 > 



is an irreducible fraction. As can be seen in [14], this function $g$
belongs to $B_r^{s,\infty}([0,1]^d)$. Let $f \in B_r^{s,\infty}([0,1]^d)$
be an arbitrary function and consider the affine subset

\[ M = \{ a \in \mathbb{R} : f + a g \in W((1-\alpha-\varepsilon)p, p) \}. \tag{50} \]

Suppose that there exist two points $a_1$ and $a_2$ in $M$. Then
$f + a_1 g - (f + a_2 g)$ belongs to $W((1-\alpha-\varepsilon)p, p)$,
and therefore there exists $c > 0$ such that

\[ \|f + a_1 g - (f + a_2 g)\|_{W((1-\alpha-\varepsilon)p,p)} = \|(a_1 - a_2) g\|_{W((1-\alpha-\varepsilon)p,p)} \le c. \tag{51} \]

As a fast calculation shows that

\[ \forall a > 0, \quad \|a g\|_{W(r,p)} = a^r \|g\|_{W(r,p)}, \tag{52} \]

we just have now to determine $\|g\|_{W(r,p)}$. Thanks to equation
(35), this is equivalent to determining, for every $t > 0$, the value of

\[ 2^{-(1-\alpha-\varepsilon)pt} \sum_{j \ge 0} 2^{j(\frac{pd}{2} - d)} \sum_{k} 1_{\{|g_{j,k}| > 2^{-t}\}}. \]

But by definition of g, we have, 

Ja >^^i S -- + ^ + -J<^ 

which implies that 

r . d d.r , . 

Note that the condition J > implies also that 

j(s-- + ^<t. (54) 
r 2 



We denote by t = — I — j and by t = — j. Thus we have, for every 



t > 0, 



i dp 



\9\\w«l-a-e)p,p) > 2-( 1 -«- £ > t sup 2^~ d ) Yl ^ 

o<i<* j=o 



ld^V»-7^2 
dp 



>2-a-*-*)i*sup sup 2^f^)^2 dJ , sup 2^f-^) £ 



°<i<7T¥ J=o ^T+i<i<< j=o 

d_d\ 



> sup f sup 2^(1 - 2^ d ), sup 2^~ d \2 rt 2-^-^ - 1) 



\0<jr'<t t<j<t 






Merging this result with (51) and (52), we obtain that, if there
exist $a_1$ and $a_2$ in $M$, then they satisfy, for every $t > 0$
and $0 < j < t$,



|ai-a 2 | {1 " <5 " £)p < inf 



c 2(l-a-e)pt c 2(l-a~e)pt 



sup ^~2^|l - 2-^| sup^-tfC* -«0|2r*2-M-+f-f ) - l| 



(55) 

We have thus two cases:

• When $r > \frac{pd}{2s+d}$, we have

\[ \alpha = \frac{2s}{2s+d}. \]

But, if we take the first term, which satisfies 

sup 2~|1 - 2~ Jd | ~ 22=+d, 
o<j<f 

we have 

\[ |a_1 - a_2|^{(1-\alpha-\varepsilon)p} \le c\, 2^{-\varepsilon p t}. \tag{56} \]

• When $r \le \frac{pd}{2s+d}$: as $s > \frac{d}{r}$, we have
necessarily $p > 2$ and

\[ \alpha = \frac{2(s - \frac{d}{r} + \frac{d}{p})}{2(s - \frac{d}{r}) + d}. \tag{57} \]

In this case, 

sup 2^t- d )\2 rt 2- jr ^ + i-^ - 1| ~ 2 2 "~V+ d . 

!<j<t 

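And once again,

\[ \forall t > 0, \quad |a_1 - a_2|^{(1-\alpha-\varepsilon)p} \le c\, 2^{-\varepsilon p t}. \tag{58} \]

As $1 - \alpha - \varepsilon > 0$, letting $t$ tend to infinity in
equations (56) and (58) shows that $a_1 = a_2$. Thus $M$ contains at
most one point, so it is of vanishing Lebesgue measure, and
$W((1-\alpha-\varepsilon)p, p)$ is a Haar-null set in
$B_r^{s,\infty}$. □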

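Thanks to invariance under inclusion, we have obtained that for
every $\varepsilon > 0$, the set of functions $f$ in $B_r^{s,\infty}$
such that

\[ \exists c > 0\ \forall n \in \mathbb{N}, \quad \mathbb{E}(\|\hat f_n^T - f\|_{L^p}^p) \le c \left(\sqrt{\frac{n}{\log n}}\right)^{-(\alpha(s)+\varepsilon)p} \tag{59} \]

is a Haar-null set.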

The end of the proof is similar to that for linear estimators.

4.4 Shrinkage procedures



The result of Theorem 2 follows directly from Proposition 6 together
with Propositions 7 and 8.